A multi-class boosting method for learning from imbalanced data

Authors

  • Xiaohui Yuan
  • Mohamed Abouelenien
Abstract

The acquisition of face images is usually limited by policy and economic considerations, so the number of training examples per subject varies greatly. Face recognition with imbalanced training data has drawn the attention of researchers. It is desirable to understand under what circumstances an imbalanced data set affects the learning outcome, and robust methods are needed to maximize the information embedded in the training data without relying heavily on user-introduced bias. In this article, we study the effects of an uneven number of training images on automatic face recognition and propose a multi-class boosting method that suppresses face recognition errors by training an ensemble with subsets of examples. By restoring the balance among classes in the subsets, our proposed multiBoost.imb method circumvents class skewness and demonstrates improved performance. Experiments are conducted with four popular face data sets and two synthetic data sets. Our method exhibits superior performance in highly imbalanced scenarios compared to AdaBoost.M1, SAMME, RUSboost, SMOTEboost, SAMME with SMOTE sampling, and SAMME with random undersampling. Another advantage of ensemble training with subsets of examples is a significant gain in efficiency.
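The abstract does not spell out the multiBoost.imb algorithm, so the following is only a rough sketch of the general idea it describes: multi-class boosting in which each weak learner is trained on a subset with the class balance restored. It assumes a SAMME-style weight update, a single-threshold decision stump as the weak learner, and per-class sampling down to the size of the rarest class; all function names are hypothetical and none of these choices is taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict

def balanced_subset(y, weights, rng):
    """Draw equally many indices per class (size of the rarest class),
    sampling with replacement in proportion to the boosting weights."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_per_class = min(len(idx) for idx in by_class.values())
    subset = []
    for idx in by_class.values():
        subset += rng.choices(idx, weights=[weights[i] for i in idx], k=n_per_class)
    return subset

def fit_stump(X, y, idx):
    """Assumed weak learner: best single-feature threshold on the subset idx.
    Returns (feature, threshold, left_label, right_label)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({X[i][f] for i in idx}):
            left = Counter(y[i] for i in idx if X[i][f] <= t)
            right = Counter(y[i] for i in idx if X[i][f] > t)
            l_lab = left.most_common(1)[0][0]
            r_lab = right.most_common(1)[0][0] if right else l_lab
            hits = left[l_lab] + right[r_lab]
            if best is None or hits > best[0]:
                best = (hits, f, t, l_lab, r_lab)
    return best[1:]

def stump_predict(stump, x):
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab

def fit_balanced_samme(X, y, rounds=20, seed=0):
    """SAMME-style boosting (assumes at least two classes); each round
    trains on a class-balanced subset but reweights the full training set."""
    rng = random.Random(seed)
    n, k = len(y), len(set(y))
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, balanced_subset(y, w, rng))
        miss = [stump_predict(stump, X[i]) != y[i] for i in range(n)]
        err = sum(wi for wi, m in zip(w, miss) if m)
        if err >= 1 - 1.0 / k:
            continue  # no better than random guessing among k classes
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = math.log((1 - err) / err) + math.log(k - 1)
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, x):
    """Weighted vote over the weak learners (ensemble must be non-empty)."""
    votes = defaultdict(float)
    for alpha, stump in ensemble:
        votes[stump_predict(stump, x)] += alpha
    return max(votes, key=votes.get)
```

Because each weak learner sees only as many examples per class as the rarest class provides, a round costs far less than training on the full set, which is in line with the efficiency gain the abstract mentions.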

Similar resources

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...

Cost-Sensitive Boosting for Classification of Imbalanced Data

The classification of data with imbalanced class distributions has posed a significant drawback in the performance attainable by most well-developed classification systems, which assume relatively balanced class distributions. This problem is especially crucial in many application domains, such as medical diagnosis, fraud detection, network intrusion, etc., which are of great importance in mach...

Multiple Classifier Combination through Ensembles and Data Generation

An ensemble of classifiers consists of a set of individually trained classifiers whose predictions are combined when classifying new instances. The resulting ensemble is generally more accurate than the individual classifiers it consists of. In particular, one of the most popular ensemble methods, the Boosting approach, improves the predictive performance of weak classifiers, which can achieve ...

A novel ensemble method for classifying imbalanced data

The class imbalance problems have been reported to severely hinder classification performance of many standard learning algorithms, and have attracted a great deal of attention from researchers of different fields. Therefore, a number of methods, such as sampling methods, cost-sensitive learning methods, and bagging and boosting based ensemble methods, have been proposed to solve these problems...

Classification of SchoolNet Data

SchoolNet Data, mainly educational material, was authored by SchoolNet to make it easy for teachers and learners to find educational resources in various subjects. The task of automatically assigning subject categories to learning materials has become one of the key steps for organizing online information. Since hand-coding classification rules is costly or even impractical, most modern approac...

Journal:
  • IJGCRSIS

Volume: 4

Publication date: 2015